New Hash Function Construction for Textual and Geometric Data Retrieval

Authors

  • Václav Skala
  • Jan Hrádek
  • Martin Kuchař
Abstract

Hashing-based techniques are heavily used in many applications, e.g. information retrieval, geometry processing, and chemical and medical applications, and even in cryptography. Traditionally, hash functions are considered in the form h(v) = f(v) mod m, where m is a prime number and f(v) is a function over the element v, which is generally of "unlimited" dimensionality and/or "unlimited" range of values. This paper presents a new approach to hash function construction that offers unique properties for textual and geometric data. Textual data have a limited range of values (the alphabet size) and "unlimited" dimensionality (the string length), while geometric data have an "unlimited" range of values (usually (-∞, ∞)) but limited dimensionality (usually 2 or 3). The construction of the hash function therefore differs for textual and geometric data, and the proposed construction has been verified on non-trivial data sets.
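
For orientation, the traditional form h(v) = f(v) mod m mentioned in the abstract can be sketched as follows. This is a minimal illustration of the conventional scheme only, not the construction proposed in the paper. The table size `M`, the multiplier `q`, the grid cell size `cell`, and the coefficients `coeffs` are all assumed values chosen for the example.

```python
import math

# Illustrative sketch only: conventional h(v) = f(v) mod m hashing,
# NOT the new construction proposed in the paper.
M = 1_000_003  # table size m, a prime (assumed value)


def text_hash(s: str, q: int = 31, m: int = M) -> int:
    """Text: limited value range (alphabet), unbounded length.

    f(v) is a polynomial in q over the character codes, evaluated
    with Horner's rule and reduced mod m at every step.
    """
    h = 0
    for ch in s:
        h = (h * q + ord(ch)) % m
    return h


def point_hash(p, cell: float = 1e-3,
               coeffs=(73856093, 19349663, 83492791), m: int = M) -> int:
    """Geometry: unbounded value range, small fixed dimension (2 or 3).

    Each coordinate is quantized to an integer grid index (assumed
    cell size), then f(v) is a linear combination of those indices.
    """
    f = 0
    for x, c in zip(p, coeffs):
        f += math.floor(x / cell) * c
    return f % m  # Python's % is non-negative for positive m


if __name__ == "__main__":
    print(text_hash("hash function"))
    print(point_hash((0.25, -1.5, 3.0)))
```

The two functions differ in the way the abstract describes: the text hash must cope with arbitrary length, the point hash with arbitrary coordinate magnitude, so each reduces a different kind of "unlimited" input to a table index in the range 0 to m - 1.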

Similar resources

A Unified Approach for Textual and Geometrical Information Retrieval

Textual and geometrical algorithms have been considered two separate fields. This is because textual data are discrete in principle and interpolation is not defined, as there is no metric in general, while geometrical data are considered discrete samples of continuous phenomena, geometrical surfaces, etc. In this paper we present a unified approach to textual and geometrical da...

An Improved Hash Function Based on the Tillich-Zémor Hash Function

Using the idea behind the Tillich-Zémor hash function, we propose a new hash function. Our hash function is parallelizable, and its collision resistance is implied by a hardness assumption on a mathematical problem. It is also secure against the known attacks. It is the most secure variant of the Tillich-Zémor hash function to date.

Identifying and Ranking the Important Textual and Paratextual Elements in Fiction Retrieval

Purpose: The purpose of this study is to identify the textual and paratextual elements used in retrieving fiction from the readers’ perspective, in order to provide the most appropriate access points for readers and to improve access to fiction based on their needs. Method: The current research is an applied study in terms of purpose, applying a mixed method that was conducted using the ...

Efficient Hash Function for Duplicate Elimination in Dictionaries

Fast elimination of duplicate data is needed in many areas, especially in the context of textual data. A solution to this problem was recently found for geometrical data, using a hash function to speed up the process. Using the hash function is extremely efficient when incremental elimination is required, especially for processing large data sets. In this paper a new construction of the hash ...

Improved Skips for Faster Postings List Intersection

Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function and query analyzer are the main components of an information retrieval system. Document retrieval systems are fundamentally based on Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...

Publication date: 2010